Newest 'class-imbalance+training' Questions

0 votes

1 answer

287 views

Aren't balanced data sets important in regression?

Why is it that the necessity for balanced data sets is (almost) always exclusively mentioned in the context of classification but not of regression?

Tfovid

195

asked Aug 11, 2021 at 15:20

1 vote

0 answers

91 views

Is it the right method to remove instances that are hard to predict before the train/test split?

In a binary classification problem, I have a slightly unbalanced medical dataset with class distribution 0: 5600, 1: 1500 (0 = without a problem, 1 = with a problem). I tried many pipelines, AutoML tools, and ...

DOT

113

asked Jun 12, 2021 at 13:14

0 votes

0 answers

287 views

Train/test split on a small dataset along with SMOTE

I have an imbalanced binary classification dataset with 1000 samples (15% in class 1, 85% in the rest). My main goal is to build a robust classifier using the following approach. Wanted to know if ...

Vardaan Khanted

23

asked May 5, 2021 at 17:26

1 vote

1 answer

975 views

Test set larger than train set [closed]

There is a two-class dataset with 1121 values in total, 230 from one class and 891 from the other. The training set is chosen as 230+230=460 from both classes and the test set as the ...

Jean

37

asked Jan 21, 2021 at 18:27

1 vote

1 answer

30 views

Many questions about training on unbalanced and duplicated data

I'm a DS student. I have about 30,000 bank statements, each labeled with a specific category (cat1, cat2, ...). With that data I'm trying to train a classification model, but I found several problems: ...

Jack Fenn

21

asked Dec 18, 2020 at 15:10

-1 votes

1 answer

80 views

Using average precision as a metric for an imbalanced problem (learning curve example) [closed]

I have an imbalanced problem (2% target class) and therefore need an appropriate metric - so I chose average_precision. My code: ...

mathella

37

asked Dec 15, 2020 at 19:21

1 vote

1 answer

43 views

[under/over]-sampling teaches model the wrong distribution?

TLDR: Will under/oversampling during the training phase teach the model the wrong distribution and adversely affect accuracy? Let us assume you want to train a classifier to differentiate between ...

Stephen Lasky

11

asked Aug 31, 2020 at 17:22

3 votes

1 answer

2k views

When downsampling training data, should we also downsample the validation data or keep the validation split as it is?

I am dealing with a class imbalance problem. In this case, I am downsampling the majority class labels in the training set. Among training, validation and test splits, the majority class in training ...

Ashwin Geet D'Sa

1,217

asked May 18, 2020 at 9:27
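
A minimal sketch of the arrangement this question asks about, assuming the common practice of splitting first and then downsampling only the training split, so that the validation and test splits keep the original class ratio. The toy data, the split sizes, and the helper name `downsample_majority` are hypothetical and not taken from the question.

```python
import random
from collections import Counter


def downsample_majority(rows, label_of, seed=0):
    """Randomly drop rows until every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # sample() draws without replacement, so no row is duplicated
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: (feature, label) pairs with a 9:1 imbalance.
rng = random.Random(42)
data = [(rng.random(), 0) for _ in range(900)]
data += [(rng.random(), 1) for _ in range(100)]
rng.shuffle(data)

# Split first, so validation and test keep the original class ratio.
train, valid, test = data[:600], data[600:800], data[800:]

# Resample the training split only.
train_balanced = downsample_majority(train, label_of=lambda row: row[1])

print(Counter(label for _, label in train_balanced))  # equal class counts
print(Counter(label for _, label in valid))           # original imbalance
```

Leaving the validation split untouched means that metrics computed on it reflect the class ratio the model would see in deployment. Whether that is the right choice for a given problem is what the question asks.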

0 votes

1 answer

1k views

Splitting into train and test sets with train_test_split for float values?

How do I split into train and test sets with train_test_split when the values are floats? I used LabelEncoder, but I have about 300K lines, and when I used cross_val I saw ...

user10296606

1,896

asked Jan 6, 2020 at 5:03

6 votes

2 answers

6k views

Resampling for imbalanced datasets: should the testing set also be resampled?

Apologies for what is probably a basic question, but I have not been able to find a definitive answer either in the literature or on the Internet. When dealing with an imbalanced dataset one possible ...

Jose Manuel Albornoz

63

asked Aug 20, 2019 at 14:03

6 votes

2 answers

595 views

Why does the real-world output of my classifier have a similar label ratio to the training data?

I trained a neural network on a balanced dataset, and it has good accuracy (~85%). But in the real world, positives appear in about 10% of cases or less. When I test the network on a set with real world ...

Bien

63

asked Apr 7, 2019 at 12:10
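
One standard way to reason about the situation in this question is to rescale the predicted probabilities for the change in class prior. The sketch below assumes the model outputs calibrated probabilities and that only the class prior differs between training and deployment. The function name `adjust_for_prior` is hypothetical; the priors 0.5 and 0.1 come from the balanced training set and the roughly 10% positive rate in the excerpt.

```python
def adjust_for_prior(p, train_prior, deploy_prior):
    """Rescale a predicted positive-class probability for a new class prior.

    Applies Bayes' rule under the assumption that the class-conditional
    feature distributions are the same in training and deployment.
    """
    pos = p * deploy_prior / train_prior
    neg = (1.0 - p) * (1.0 - deploy_prior) / (1.0 - train_prior)
    return pos / (pos + neg)


# A score of 0.9 from a model trained on balanced data (prior 0.5),
# reinterpreted for a deployment prior of 0.1.
print(adjust_for_prior(0.9, train_prior=0.5, deploy_prior=0.1))
```

Under these assumptions, a score that meant "very likely positive" on balanced data corresponds to a much lower probability once positives are rare. A fixed 0.5 threshold applied to the unadjusted scores would then flag positives at a rate well above the real-world one.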

2 votes

2 answers

123 views

oversampling data with subclass

Oversampling of under-represented data is a way to combat class imbalance. For example, if we have a training data set with 100 data points of class A and 1000 data points of class B, we can over ...

chaohuang

191

asked Apr 1, 2019 at 8:00
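
The basic random oversampling described in the excerpt can be sketched as follows, using its numbers (100 points of class A, 1000 of class B). This is an illustrative sketch only: the helper name `oversample_minority` is hypothetical, and it does not address the subclass aspect raised in the title.

```python
import random
from collections import Counter


def oversample_minority(labels, seed=0):
    """Return sample indices in which smaller classes are repeated
    (drawn with replacement) until each class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = max(len(idxs) for idxs in by_class.values())
    indices = []
    for idxs in by_class.values():
        indices.extend(idxs)  # keep every original sample once
        indices.extend(rng.choices(idxs, k=target - len(idxs)))  # add repeats
    return indices


labels = ["A"] * 100 + ["B"] * 1000
resampled = oversample_minority(labels)
counts = Counter(labels[i] for i in resampled)
print(counts)
```

Returning indices rather than copied rows keeps the helper independent of how the features are stored. The same index list can be used to select rows from any parallel feature array.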

1 vote

3 answers

4k views

Downsampling and class ratios

My target variable is whether an application is accepted or not. It is a highly imbalanced target with 98.5% of applications accepted. I am unclear about the concept of downsampling. If I were to ...

Soorya Paturi

21

asked Nov 12, 2018 at 15:08

7 votes

2 answers

3k views

How to fix class imbalance in training sample?

I was very recently asked in a job interview about solutions to fix an imbalance of classes in the training dataset. Let's focus on a binary classification case. I offered two solutions: oversampling ...

Learning is a mess

646

asked Feb 27, 2018 at 15:48
